Efficient Substring Traversal with Suffix Arrays
Authors
Abstract
The substring traversal problem is the problem of enumerating all branching substrings appearing in a given text. Although this problem is easily solved with the suffix tree of McCreight (1976), a space-efficient and practically fast solution is important. We devise a simple and efficient algorithm that simulates the traversal of the suffix tree for a given text using the suffix array of Manber and Myers (1993) and Gonnet, Baeza-Yates, and Snider (1992). The algorithm runs in O(n) time and 5n B bulk I/O with the suffix array and an additional structure called the height array, while the naive algorithm using binary search on the suffix array requires O(n^2) time in the worst case. The 7N-byte space requirement of our algorithm is smaller than the 15N bytes of the traversal algorithm with the suffix tree. A linear-time algorithm for computing the height array from the suffix and the height arrays is also presented. Computer experiments on real datasets showed that our traversal algorithm with the suffix array is an order of magnitude faster than the naive simulation method and comparable to the traversal algorithm with the suffix tree.
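To make the idea concrete, here is a minimal Python sketch of simulating a suffix tree traversal with a suffix array and a height array. It is not the authors' implementation: the function names are hypothetical, the suffix array is built by naive sorting, and the height array is filled by a well-known linear-time scan over the text and the suffix array, used here as a stand-in because the abstract does not spell out its own procedure. The sketch also assumes that the end of the text counts as a unique terminator, so a substring counts as branching when at least two suffixes share it as a prefix and then diverge.

```python
def suffix_array(text):
    # Naive construction by sorting suffixes; for illustration only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def height_array(text, sa):
    # height[i] = length of the longest common prefix of the suffixes
    # starting at sa[i - 1] and sa[i]; height[0] is defined as 0.
    n = len(text)
    rank = [0] * n
    for i, start in enumerate(sa):
        rank[start] = i
    height = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            height[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h - 1 characters
        else:
            h = 0
    return height


def branching_substrings(text):
    # Simulates a bottom-up traversal of the suffix tree's internal nodes.
    # Returns (substring, number of occurrences) pairs.
    sa = suffix_array(text)
    height = height_array(text, sa)
    n = len(text)
    result = []
    stack = [(0, 0)]  # (shared prefix length, left boundary in sa)
    for i in range(1, n + 1):
        h = height[i] if i < n else 0
        left = i - 1
        while stack[-1][0] > h:
            depth, lb = stack.pop()
            # Suffixes sa[lb] .. sa[i - 1] share a prefix of length depth.
            result.append((text[sa[lb]:sa[lb] + depth], i - lb))
            left = lb
        if stack[-1][0] < h:
            stack.append((h, left))
    return result


if __name__ == "__main__":
    for substring, count in branching_substrings("banana"):
        print(repr(substring), count)
```

Each position of the suffix array is pushed and popped at most once, so the traversal loop itself is linear in the text length; only the naive sorting step in this sketch is slower.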
Similar resources
Optimal exact string matching based on suffix arrays
Using the suffix tree of a string S, decision queries of the type "Is P a substring of S?" can be answered in O(|P|) time and enumeration queries of the type "Where are all z occurrences of P in S?" can be answered in O(|P| + z) time, totally independent of the size of S. However, in large-scale applications such as genome analysis, the space requirements of the suffix tree are a severe drawback. The suffix ...
Generalizations of suffix arrays to multi-dimensional matrices
We propose multi-dimensional index data structures that generalize suffix arrays to square matrices and cubic matrices. Giancarlo proposed a two-dimensional index data structure, the Lsuffix tree, that generalizes suffix trees to square matrices. However, the construction algorithm for Lsuffix trees maintains complicated data structures and uses a large amount of space. We present simple and practical ...
Constructing Suffix Arrays of Large Texts
Recently, Sadakane [12] proposes a new fast and memory-efficient algorithm for sorting suffixes of a text in lexicographic order. It is important to sort suffixes because an array of indexes of suffixes is called a suffix array, and it is a memory-efficient alternative to the suffix tree. Sorting suffixes is also used for the Burrows-Wheeler transformation in the Block Sorting text compression, therefore fast sor...
A Fast Algorithm for Making Suffix Arrays and for Burrows-Wheeler Transformation
We propose a fast and memory-efficient algorithm for sorting suffixes of a text in lexicographic order. It is important to sort suffixes because an array of indexes of suffixes is called a suffix array, and it is a memory-efficient alternative to the suffix tree. Sorting suffixes is also used for the Burrows-Wheeler transformation in the Block Sorting text compression, therefore fast sorting algorithms are desire...
A Fast Algorithm for Making Suffix Arrays and for Burrows-Wheeler Transformation
We propose a fast and memory-efficient algorithm for sorting suffixes of a text in lexicographic order. It is important to sort suffixes because an array of indexes of suffixes is called a suffix array, and it is a memory-efficient alternative to the suffix tree. Sorting suffixes is also used for the Burrows-Wheeler transformation in the Block Sorting text compression, therefore fast sorting algorithms are desire...
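Several of the abstracts above note that sorting suffixes yields both the suffix array and the Burrows-Wheeler transformation. The following Python sketch illustrates that connection only: it uses naive sorting rather than any of the fast algorithms these papers propose, the function names are hypothetical, and it assumes a sentinel character `$` that does not occur in the text and sorts before every other character in it.

```python
def suffix_array(text):
    # Naive suffix sorting; for illustration only, not a fast algorithm.
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt(text, sentinel="$"):
    # Burrows-Wheeler transformation read off the suffix array:
    # for each suffix in sorted order, emit the character preceding it.
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    sa = suffix_array(s)
    # For the suffix starting at 0, s[-1] wraps around to the sentinel.
    return "".join(s[i - 1] for i in sa)


if __name__ == "__main__":
    print(suffix_array("banana$"))
    print(bwt("banana"))
```

Because the output is just the suffix array read with an offset of one position, any faster suffix sorting routine could replace `suffix_array` here without changing `bwt`.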